 phase transition


Phase transitions in Doi-Onsager, Noisy Transformer, and other multimodal models

Mun, Kyunghoo, Rosenzweig, Matthew

arXiv.org Machine Learning

We study phase transitions for repulsive-attractive mean-field free energies on the circle. For a $\frac{1}{n+1}$-periodic interaction whose Fourier coefficients satisfy a certain decay condition, we prove that the critical coupling strength $K_c$ coincides with the linear stability threshold $K_\#$ of the uniform distribution and that the phase transition is continuous, in the sense that the uniform distribution is the unique global minimizer at criticality. The proof is based on a sharp coercivity estimate for the free energy obtained from the constrained Lebedev--Milin inequality. We apply this result to three motivating models for which the exact value of the phase transition and its (dis)continuity in terms of the model parameters were not fully known. For the two-dimensional Doi--Onsager model $W(\theta)=-|\sin(2\pi\theta)|$, we prove that the phase transition is continuous at $K_c=K_\#=3\pi/4$. For the noisy transformer model $W_\beta(\theta)=(e^{\beta\cos(2\pi\theta)}-1)/\beta$, we identify the sharp threshold $\beta_*$ such that $K_c(\beta) = K_\#(\beta)$ and the phase transition is continuous for $\beta\leq \beta_*$, while $K_c(\beta) < K_\#(\beta)$ and the phase transition is discontinuous for $\beta > \beta_*$. We also obtain the corresponding sharp dichotomy for the noisy Hegselmann--Krause model $W_{R}(\theta) = (R-2\pi|\theta|)_{+}^2$.
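As a rough numerical illustration of the linear stability threshold, the sketch below computes the cosine Fourier coefficients $a_k = 2\int_0^1 W(\theta)\cos(2\pi k\theta)\,d\theta$ of the Doi--Onsager interaction and evaluates $1/\max_k a_k$. The normalization $K_\# = 1/\max_{k\geq 1} a_k$ is an assumption, chosen because it reproduces the value $3\pi/4$ quoted in the abstract; the abstract does not state the paper's convention for the free energy. All function names are hypothetical.

```python
import math


def cosine_coefficient(W, k, n_grid=20000):
    """Midpoint-rule estimate of a_k = 2 * int_0^1 W(theta) cos(2 pi k theta) dtheta."""
    h = 1.0 / n_grid
    total = 0.0
    for j in range(n_grid):
        theta = (j + 0.5) * h
        total += W(theta) * math.cos(2.0 * math.pi * k * theta)
    return 2.0 * total * h


def linear_threshold(W, k_max=20, n_grid=20000):
    """ASSUMED normalization: K_# = 1 / max_{1 <= k <= k_max} a_k.

    Returns infinity when no scanned mode has a positive coefficient.
    """
    best = max(cosine_coefficient(W, k, n_grid) for k in range(1, k_max + 1))
    return 1.0 / best if best > 0.0 else math.inf


def doi_onsager(theta):
    """Two-dimensional Doi--Onsager interaction W(theta) = -|sin(2 pi theta)|."""
    return -abs(math.sin(2.0 * math.pi * theta))


if __name__ == "__main__":
    # Compare the numerical threshold with the value 3*pi/4 from the abstract.
    print(linear_threshold(doi_onsager), 3.0 * math.pi / 4.0)
```

Under this convention the dominant mode is $k=2$, with $a_2 = 4/(3\pi)$ from the classical cosine series of $|\sin x|$.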


Phase transition on a context-sensitive random language model with short range interactions

Toji, Yuma, Takahashi, Jun, Roychowdhury, Vwani, Miyahara, Hideyuki

arXiv.org Machine Learning

Since the random language model was proposed by E. DeGiuli [Phys. Rev. Lett. 122, 128301], language models have been investigated intensively from the viewpoint of statistical mechanics. Recently, the existence of a Berezinskii--Kosterlitz--Thouless transition was numerically demonstrated in models with long-range interactions between symbols. In statistical mechanics, it has long been known that long-range interactions can induce phase transitions. Therefore, it has remained unclear whether phase transitions observed in language models originate from genuinely linguistic properties that are absent in conventional spin models. In this study, we construct a random language model with short-range interactions and numerically investigate its statistical properties. Our model belongs to the class of context-sensitive grammars in the Chomsky hierarchy and allows explicit reference to contexts. We find that a phase transition occurs even when the model refers only to contexts whose length remains constant with respect to the sentence length. This result indicates that finite-temperature phase transitions in language models are genuinely induced by the intrinsic nature of language, rather than by long-range interactions.



Thermodynamic Isomorphism of Transformers: A Lagrangian Approach to Attention Dynamics

Kim, Gunn

arXiv.org Machine Learning

We propose an effective field-theoretic framework for analyzing Transformer attention through a thermodynamic lens. By constructing a Lagrangian on the information manifold equipped with the Fisher metric, we show that, within the Shannon--Boltzmann entropy framework, the Softmax function arises as a stationary solution minimizing a Helmholtz free energy functional. This establishes a formal correspondence between scaled dot-product attention and canonical ensemble statistics. Extending this mapping to macroscopic observables, we define an effective specific heat associated with fluctuations of the attention energy landscape. In controlled experiments on the modular addition task ($p = 19$--$113$), we observe a robust peak in this fluctuation measure that consistently precedes the onset of generalization. While no asymptotic power-law divergence is detected in this finite-depth regime, the reproducible enhancement of energy variance suggests a critical-like crossover accompanying representational reorganization. Our framework provides a unified statistical-mechanical perspective on attention scaling, training dynamics, and positional encoding, interpreting the phenomena as emergent properties of an effective thermodynamic system rather than isolated heuristics. Although the present results indicate finite-size crossover behavior rather than a strict phase transition, they motivate further investigation into scaling limits of deep architectures through fluctuation-based observables.
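The correspondence between Softmax and the canonical ensemble can be checked numerically with textbook identities. The sketch below assumes energies $E_i = -s_i$ for attention scores $s_i$, the free energy $F[p] = \sum_i p_i E_i + T\sum_i p_i \log p_i$, and the canonical fluctuation formula $C = \mathrm{Var}(E)/T^2$. These are standard statistical-mechanics definitions and may differ from the paper's Lagrangian construction and from its effective specific heat. All names, the temperature, and the example scores are hypothetical.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax of scores / temperature."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def free_energy(p, energies, temperature=1.0):
    """Helmholtz free energy F[p] = <E>_p - T * S[p] with Shannon entropy S."""
    mean_energy = sum(pi * e for pi, e in zip(p, energies))
    entropy = -sum(pi * math.log(pi) for pi in p if pi > 0.0)
    return mean_energy - temperature * entropy


def specific_heat(p, energies, temperature=1.0):
    """Canonical fluctuation formula C = Var_p(E) / T^2 (assumed definition)."""
    mean_energy = sum(pi * e for pi, e in zip(p, energies))
    variance = sum(pi * (e - mean_energy) ** 2 for pi, e in zip(p, energies))
    return variance / temperature ** 2


TEMPERATURE = 2.0                      # illustrative value, e.g. sqrt(d_k) with d_k = 4
scores = [2.0, 0.5, -1.0, 0.0]         # hypothetical attention scores
energies = [-s for s in scores]        # ASSUMPTION: energy is the negative score

p_star = softmax(scores, TEMPERATURE)
f_star = free_energy(p_star, energies, TEMPERATURE)
c_star = specific_heat(p_star, energies, TEMPERATURE)
```

At the Softmax distribution the free energy equals $-T\log Z$ with $Z = \sum_i e^{-E_i/T}$, and any other distribution $p$ has a free energy larger by $T\,\mathrm{KL}(p\,\|\,p^*)$, which is what makes Softmax the minimizer in this setting.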







The committee machine: Computational to statistical gaps in learning a two-layers neural network

Benjamin Aubin, Antoine Maillard, Jean Barbier, Florent Krzakala, Nicolas Macris, Lenka Zdeborová

Neural Information Processing Systems

Heuristic tools from statistical physics have been used in the past to locate the phase transitions and compute the optimal learning and generalization errors in the teacher-student scenario in multi-layer neural networks. In this contribution, we provide a rigorous justification of these approaches for a two-layers neural network model called the committee machine. We also introduce a version of the approximate message passing (AMP) algorithm for the committee machine that allows optimal learning to be performed in polynomial time for a large set of parameters.
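For orientation, a committee machine in the teacher-student setting can be sketched as follows. The form used here, $y = \mathrm{sign}\big(\sum_{l=1}^{K} \mathrm{sign}(w_l \cdot x)\big)$ with Gaussian inputs and weights, is the common textbook definition and is an assumption about the setup; the abstract does not spell out the model, and the AMP algorithm itself is not reproduced. Function names, sizes, and the seed are hypothetical.

```python
import random


def sign(value):
    """Sign with the convention sign(0) = +1."""
    return 1 if value >= 0 else -1


def committee_output(weights, x):
    """Majority vote of the hidden units: sign(sum_l sign(w_l . x))."""
    votes = sum(sign(sum(w_i * x_i for w_i, x_i in zip(w, x))) for w in weights)
    return sign(votes)


def random_weights(n_hidden, dim, rng):
    """One Gaussian weight vector per hidden unit (assumed prior)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_hidden)]


def generalization_error(teacher, student, dim, n_samples, rng):
    """Monte Carlo estimate of the teacher-student disagreement on fresh Gaussian inputs."""
    mistakes = 0
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        if committee_output(teacher, x) != committee_output(student, x):
            mistakes += 1
    return mistakes / n_samples


rng = random.Random(0)
DIM, N_HIDDEN = 20, 3                  # an odd number of hidden units avoids tied votes
teacher = random_weights(N_HIDDEN, DIM, rng)
student = random_weights(N_HIDDEN, DIM, rng)  # untrained student, for comparison only
```

A student that copies the teacher's weights has zero disagreement, while an independent random student disagrees on a substantial fraction of inputs; a learning algorithm such as AMP aims to close that gap from labeled samples.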